- # This program generates a set of HTML documents corresponding to a text document
- # (one for each chapter and a table of contents, as explained below).
-
- # If your document contains illustrations, you must have created the document with
- # EnterAct 3.7 or later (which places all PICTs in the resource fork and numbers
- # them 1000, 1001, 1002...1400). More exactly, if you have an old EnterAct document
- # you must insert or delete at least one picture using EnterAct 3.7 or later, in
- # order to force renumbering of the pictures. If you're not sure, use a resource
- # editor to check the numbering of the PICTs in the doc's resource fork.
-
- # An overview of the whole process, using "EnterAct 3 Manual" as an example:
- # • get a copy of "clip2gif" by Yves Piguet (check your CDs, or use Anarchie to fetch it from Info-Mac)
- # • structure your document ("EnterAct 3 Manual"), using EnterAct if you have PICT illustrations
- # (more on this important step below)
- # • create a folder to hold the html version of your document, and within it
- # create a folder to hold the gifs ("Disk:...:E3M_HTML:Graphics:")
- # • run the script "PICT rsrcs to numbered gifs" from within EnterAct: in the
- # first dialog, select the document to convert ("EnterAct 3 Manual"); in the second
- # dialog, select the folder to hold the gifs ("Graphics"). NOTE: to avoid being pestered
- # about where "clip2gif" is, open the script in Apple's "Script Editor" first,
- # relocate "clip2gif" when Script Editor asks you to, and then do a Save.
- # • when "PICT rsrcs to numbered gifs" is done, it presents you with a command
- # line to run this program: select it and hit <enter> to run it. You should also
- # save the command line(s) away somewhere for future reference. It looks like this:
- # hAWK -f$TextToHTML -vgifList="Disk:CW CEDAR:E3M_HTML:Graphics:Unsorted gif list"
- # -- "Disk:CW CEDAR:EnterAct Stuff:Documentation:EnterAct 3 Manual"
- # (it will be stored in the file "$tempScriptResult" until you run your next script with EnterAct)
- # • when this program is done, there will be a "Contents.html" file at the top
- # of the folder that holds the html version of your document, and also
- # within it will be a "Text" folder holding the chapter documents, and
- # your folder for the gifs will be chock full of gifs.
- # • drag the "Contents.html" file onto your favourite browser and verify that
- # things came out the way you wanted.
-
- # If your document contains no illustrations then you don't need to run the
- # "PICT rsrcs to numbered gifs" script first. But you still need to construct
- # a command line to run this program. First create a folder to hold the results
- # (eg "Disk:...:MyDoc HTML:"). Remember the full path names for this folder and for
- # your source document ("MyDoc"). Your command line should then look like this:
- # hAWK -f$TextToHTML -vgifList="Disk:...:MyDoc HTML:Graphics:Unsorted gif list"
- # -- "Disk:...:MyDoc"
- # Note that neither the "Graphics" folder nor the file "Unsorted gif list" will
- # exist, but you still need to include them in the command line argument. This
- # is a minor nuisance, but if you ever decide to include illustrations in your
- # document then all you have to do is create the actual "Graphics" folder within
- # the "MyDoc HTML" folder, run the script "PICT rsrcs to numbered gifs", and
- # then run the exact same command line that you produced above (or run the
- # command line that the script produces, which will be exactly the same).
-
-
- # The only tricky bit here is structuring your document so that this program can
- # convert the structure into proper html formatting. Characters in your text have to
- # clearly signal where headings are, where lists start, and so on. This program is
- # set up to handle a specific fixed structure, and if your document doesn't follow
- # this structure you'll have to change either your document or this program,
- # whichever seems easier.
-
- # The "EnterAct 3 Manual" is an example of a document structured for use with
- # this program; browse through it as you read the rules below.
-
- # Here are the structuring rules that this program expects:
- # • First line: presumed to be the title, and skipped (the title is instead taken from
- # the name of the document).
- # • Each chapter title should be between "dash-space" lines - - - - -,
- # that is, one dash-space line immediately before the chapter title and
- # another immediately after. A dash-space line must begin with a dash '-',
- # followed by at least one more character, and may contain only dashes and
- # spaces (in any mix you like) after the starting dash.
- # • If you have a table of contents in your document, it should be preceded with
- # the chapter title "CONTENTS" (between dash-space lines). The entire table
- # of contents will be skipped, and regenerated by this program. Note your
- # table of contents should be followed by a chapter title, since everything
- # from the "CONTENTS" chapter title to the next chapter title will be skipped.
- # • Within each chapter, major subheadings that should be included in the main
- # table of contents should be formatted "§\tSub section name". Subheadings
- # that should NOT be shown in the main table of contents but SHOULD be shown
- # in the table of contents for the chapter itself should be formatted
- # "(§)\tSub section name".
- # • Subheadings that should not be shown in any table of contents should take
- # the form ">\tSub section name".
- # • Any illustrations (PICT) must have been inserted using EnterAct, and you must
- # have inserted or deleted at least one PICT using v3.7 or later of EnterAct.
- # • Major lists begin with "\t•". Sublists begin with "\t\t•". There should be no
- # blank lines within a list, since a blank line signals the end of the list
- # (including any sublists).
- # • Chunks of text that are to be quoted as-is with no reformatting should be
- # between lines that consist of exactly four underscores "____".
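-
- # A rough sketch (not part of this program) of how the rules above map onto
- # patterns. The tests are the ones "DoTheLines()" uses below, in the same order,
- # so they also accept tabs and surrounding white space where the code does; the
- # function name "KindOfLine" and the strings it returns are made up for illustration:
- #     function KindOfLine(s)
- #     {
- #         if (s ~ /^[ \t]*____[ \t]*$/) return "as-is marker";
- #         if (s ~ /^\t•/)               return "list item";
- #         if (s ~ /^\t\t•/)             return "sublist item";
- #         if (s == "")                  return "blank line (ends a list, starts a paragraph)";
- #         if (s ~ /^-[ \t-]+$/)         return "dash-space line";
- #         if (s ~ /^§\t/)               return "subheading, in main contents";
- #         if (s ~ /^\(§\)\t/)           return "subheading, in chapter contents only";
- #         if (s ~ /^>\t/)               return "subheading, in no contents";
- #         return "regular text";
- #     }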
-
- # If you wish to modify the structure that this program uses, you'll need to change
- # the function "DoTheLines()" below. Note that structuring information is expected
- # to come BEFORE the text to be formatted (for a chapter title a dash-space line comes
- # both before and after the title, but the one after the title is just for the sake of
- # appearance in the non-html version). If you want any structuring information to
- # come AFTER the text (such as indicating a chapter title by just a dash-space line
- # following the title, no dash-space line before it) then you will have to buffer
- # the lines as they are read in from your document, since you won't know that
- # formatting is required until you've looked at the line after the one that needs
- # to be formatted.
- # For some help handling this, see
- # «hAWK User’s Manual» «R 2 Beyond input records»
- # especially the part a couple of pages in, about "End–buffered input".
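- # A minimal sketch of that kind of buffering (an illustration only, not taken from
- # the hAWK manual): hold each line back until the next one has been read. It assumes
- # a chapter title is marked ONLY by a dash-space line after it, and it reads from
- # "inputFile" as this program does. "DoBufferedLines" and "HandleLine" are made-up names.
- #     function HandleLine(s, kind)
- #     {
- #         print kind ": " s;
- #     }
- #     function DoBufferedLines(    held, haveHeld)
- #     {
- #         haveHeld = 0;
- #         while ((getline < inputFile) > 0)
- #         {
- #             if ($0 ~ /^-[ \t-]+$/)
- #             {
- #                 # The dash-space line tells us the held line was a title.
- #                 if (haveHeld)
- #                     HandleLine(held, "chapter title");
- #                 haveHeld = 0;    # the dash-space line itself is dropped
- #             }
- #             else
- #             {
- #                 # Only now do we know the held line needed no special formatting.
- #                 if (haveHeld)
- #                     HandleLine(held, "regular text");
- #                 held = $0;
- #                 haveHeld = 1;
- #             }
- #         }
- #         if (haveHeld)
- #             HandleLine(held, "regular text");    # flush the last line
- #     }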
-
- # TOC and anchor handling:
- # Given "Name of Heading" and heading level H2 or H3:
- # 1 replace with <Hn><A NAME = "Name of Heading" >Name of Heading</A></Hn>
- # 2 if H2 head, increment h2Counter
- # 3 accumulate headings in TOC[h2Counter], with SUBSEP in between
- # 4 at end split each array entry, output URLs in unordered lists for TOC.
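-
- # Steps 2 to 4 boiled down, as a sketch with made-up heading names (the real work
- # is done in "DoTheLines()", "DoTOC()" and "DoChapterTOC()" below; "TOCSketch" is a
- # hypothetical name and nothing calls it):
- #     function TOCSketch(    toc, n, parts, count, j)
- #     {
- #         toc[++n] = "Chapter One" SUBSEP;           # H2 head: start a new entry
- #         toc[n] = toc[n] "First section" SUBSEP;    # H3 heads: append to it
- #         toc[n] = toc[n] "Second section" SUBSEP;
- #         count = split(toc[n], parts, SUBSEP);
- #         if (parts[count] == "")                    # trailing SUBSEP leaves an empty piece
- #             --count;
- #         print "<UL>";
- #         for (j = 2; j <= count; ++j)
- #             print "<LI> <A HREF = \"#" parts[j] "\"> " parts[j] " </A>";
- #         print "</UL>";
- #     }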
-
- BEGIN {
- InitSpecialCharacters();
- # Remember full name of main document.
- inputFile = ARGV[1];
- # Get title and main head from name of document.
- n = split(inputFile, names, ":");
- theMainTitle = names[n];
- theMainHead = names[n];
-
- contentsMarker = "TOC GOES HERE";
-
- inList = 0;
- listElement = 0;
- h2Counter = 0;
- newParagraphComing = 1;
-
- currentGIF = 0;
- numGIFs = 0;
- doingGIF = 0;
- doingAsIs = 0;
-
- # "gifList", full path to Unsorted gif list, should be preset
- # in dialog or on the command line.
- gifArrayFile = gifList;
-
- gifPartialLocation = "../Graphics/"; ##see just below
- outFile = ""
- contentsFileName = "Contents.html";
-
- n = split(gifArrayFile, names, ":");
- for (i = 1; i < n - 1; ++i)
- {
- chaptersFolder = chaptersFolder names[i] ":";
- }
-
-
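- # Given a full path, the loop above exits with i == n - 1, which indexes
- # the folder that holds the gif list.
- graphicsFolderName = names[i];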
- gifPartialLocation = "../" graphicsFolderName "/"; # eg "../Graphics/"
-
- contentsFileLocation = chaptersFolder;
- chaptersFolder = chaptersFolder "Text:";
- ## eg chaptersFolder = STDPATH "E3M_HTML:Text:"
-
- MakeFolder(chaptersFolder);
-
- LoadGIFNames();
-
- # The main event: inhale all the lines, write the chapters, write main contents.
- DoTheLines();
- WriteMainContentsFile();
-
- # Notify we're done.
- print "HTML conversion of", theMainTitle, "complete.";
- }
-
- function InitSpecialCharacters()
- {
- ampersand = "\\&amp;";
- lessThan = "\\&lt;";
- greaterThan = "\\&gt;";
- quote = "\\&quot;";
- euroLeft = "\\&laquo;";
- euroRight = "\\&raquo;";
- bullet = "\\&middot;";
- dornk = "\\&not;";
- section = "\\&sect;";
- para = "\\&para;";
- cedillaC = "\\&ccedil;";
- shy = "\\&shy;";
- copyright = "\\&copy;";
- registration = "\\&reg;";
- question = "\\&#63;";
- }
-
- # do for all input lines
- function DoTheLines()
- {
- getline < inputFile; # skip the first line
- while ((getline < inputFile) > 0)
- {
- # Stop skipping blank lines if we were doing a GIF and hit nonblank
- if ($0 != "")
- doingGIF = 0;
-
- # "As is" sections shouldn't contain any other "structure"
- if ($0 ~ /^[ \t]*____[ \t]*$/)
- {
- doneFirstPRELine = 0;
- while ((getline < inputFile) > 0)
- {
- if ($0 ~ /^[ \t]*____[ \t]*$/)
- {
- if (doneFirstPRELine == 1)
- PrintToOutFile("</PRE>");
- break;
- }
- else
- {
- ReplaceSpecialCharacters();
- if (doneFirstPRELine == 1)
- PrintToOutFile($0);
- else
- {
- PrintToOutFile("<PRE>" $0);
- doneFirstPRELine = 1;
- }
- }
- }
- }
- # Remember if list element starting (one or two tabs, bullet)
- else if ($0 ~ /^[\t]+•/)
- {
- if ($0 ~ /^\t•/)
- {
- # We may be starting a brand new list
- if (inList == 0)
- {
- PrintToOutFile("<UL>");
- inList = 1;
- }
- else
- {
- # A new list element ends any subelement
- if (subListElement == 1)
- {
- PrintToOutFile("\t</UL>");
- subListElement = 0;
- }
- }
- listElement = 1;
- # Print the first line of the new list element
- sub(/^[\t]+•[ \t]*/, "");
- ReplaceSpecialCharacters();
- PrintToOutFile("<LI>" $0);
- }
- else if ($0 ~ /^\t\t•/)
- {
- # First subelement, start a new list
- if (subListElement == 0)
- {
- PrintToOutFile("\t<UL>");
- }
- subListElement = 1;
- # Print the first line of the new list subelement
- sub(/^[\t]+•[ \t]*/, "");
- ReplaceSpecialCharacters();
- PrintToOutFile("\t<LI>" $0);
- }
- }
- # A list finishes with a blank line (also signals new paragraph)
- else if ($0 == "")
- {
- if (inList)
- {
- if (subListElement == 1)
- PrintToOutFile("\t</UL>");
- if (listElement == 1)
- PrintToOutFile("</UL>");
- inList = 0;
- listElement = 0;
- subListElement = 0;
- }
- # Print blank lines, unless we're skipping GIF space
- if (doingGIF == 0)
- PrintToOutFile("");
- newParagraphComing = 1;
- }
- # Top level heading H2, between dashed lines
- # This heading starts a new file
- else if ($0 ~ /^-[ \t-]+$/)
- {
- getline < inputFile;
- # Skip the "CONTENTS" section
- SkipUnwantedSections();
- # having trouble with quotes in chapter titles
- SimplifyQuotes();
- ReplaceSpecialCharacters();
- # make sure we can use text as a file name
- gsub(/\t/, " ");
- gsub(/:/, " ");
- StartNewChapter($0);
- TOC[++h2Counter] = $0 SUBSEP;
- getline < inputFile;
- }
- # Second level heading, shown in main contents
- else if ($0 ~ /^§\t/)
- {
- tempLine = substr($0, 3);
- $0 = tempLine;
- ReplaceSpecialCharacters();
- TOC[h2Counter] = TOC[h2Counter] $0 SUBSEP;
- PrintHeading($0, "H3");
- }
- # Second level heading, NOT shown in main contents
- # -- name is preceded with "!" in TOC array to signal that.
- else if ($0 ~ /^\(§\)\t/)
- {
- tempLine = substr($0, 5);
- $0 = tempLine;
- ReplaceSpecialCharacters();
- TOC[h2Counter] = TOC[h2Counter] "!" $0 SUBSEP;
- PrintHeading($0, "H3");
- }
- # Third level heading, not in any contents (although it does have an anchor)
- else if ($0 ~ /^>\t/)
- {
- tempLine = substr($0, 3);
- $0 = tempLine;
- ReplaceSpecialCharacters();
- PrintHeading($0, "H4");
- }
- # GIF entry, <option><space> by itself on a line
- else if ($0 ~ /^ $/)
- {
- PrintGIFTag(gifPartialLocation, gifName[++currentGIF]);
- doingGIF = 1;
- }
- # regular line
- else
- {
- ReplaceSpecialCharacters();
- if (newParagraphComing == 1)
- PrintToOutFile("<P>" $0);
- else
- PrintToOutFile($0);
- newParagraphComing = 0;
- }
- }
- }
-
- # Load AND sort the gif names, to gifName[ 1..numGIFs ].
- # (gif name format is "arbtext#ddddd.gif")
- function LoadGIFNames( x, p, a, b, numSpot, numA, trueP, i)
- {
- numGIFs = 0;
- numA = 0;
- while ((getline x < gifArrayFile) > 0)
- p[++numGIFs] = x;
- for (i = 1; i <= numGIFs; ++i)
- {
- numSpot = match(p[i], /#[0-9]+/);
- # allow other files in folder, or other text in list
- if (numSpot > 0)
- {
- a[++numA] = substr(p[i], numSpot+1, RLENGTH-1);
- trueP[numA] = p[i];
- }
- }
- if (numA+0 > 0)
- {
- sort(a,b,"n");
- for (i = 1; i <= numA; ++i)
- {
- gifName[i] = trueP[b[i]];
- }
- }
- numGIFs = numA;
- }
-
- # Print to specific file, if there is one.
- function PrintToOutFile(s)
- {
- if (outFile != "")
- print s > outFile;
- }
-
- function ReplaceSpecialCharacters()
- {
- gsub(/\&/, ampersand); # keep this one first
- gsub(/</, lessThan);
- gsub(/>/, greaterThan);
- gsub(/«/, euroLeft);
- gsub(/»/, euroRight);
- gsub(/•/, bullet);
- gsub(/¬/, dornk);
- gsub(/§/, section);
- gsub(/¶/, para);
- gsub(/ç/, cedillaC);
- gsub(/—/, shy);
- gsub(/©/, copyright);
- gsub(/®/, registration);
-
- # Question mark gives trouble in anchors, do it too
- #gsub(/\?/, question);
-
- # Short dash -- where is it in the ISO Latin-1 set??
- gsub(/–/, "-");
- # Ditto "…"
- gsub(/…/, "...");
- # And hey, where's ƒ?
- gsub(/ƒ/, "f");
-
- # straighten out the quotes and ticks
- gsub(/“/, "\"");
- gsub(/”/, "\"");
- gsub(/‘/, "'");
- gsub(/’/, "'");
- # do quotes last since we may have generated some new ones
- gsub(/"/, quote);
- }
-
- function SimplifyQuotes()
- {
- gsub(/“/, "'");
- gsub(/”/, "'");
- gsub(/‘/, "'");
- gsub(/’/, "'");
- gsub(/"/, "'");
- }
-
- # Skip the "CONTENTS" section.
- function SkipUnwantedSections()
- {
- if ($0 ~ /^[ \t]*CONTENTS[ \t]*$/)
- {
- getline < inputFile;
- while ((getline < inputFile) > 0)
- {
- if ($0 ~ /^-[ \t-]+$/)
- {
- getline < inputFile;
- break;
- }
- }
- }
- }
-
- # Finish writing current chapter, if any. Start a new temp file for
- # next chapter (tack "x" onto chapter name for the temp version)
- # and pump out the starting HTML.
- function StartNewChapter(chapterName, truncatedName, nameLength)
- {
- if (outFile != "")
- FinishCurrentChapter();
- truncatedName = TempFileNameForChapter(chapterName);
- outFile = chaptersFolder truncatedName;
- StartHTML(chapterName);
- }
-
- function TempFileNameForChapter(chapterName, fileName, nameLength)
- {
- fileName = chapterName ".html" "x";
- nameLength = length(fileName);
- if (nameLength > 31)
- fileName = substr(chapterName, 1, 25) ".html" "x";
- return fileName;
- }
-
- function FileNameForChapter(chapterName, fileName, tempName, nameLength)
- {
- tempName = TempFileNameForChapter(chapterName);
- nameLength = length(tempName);
- fileName = substr(tempName, 1, nameLength - 1);
- return fileName;
- }
-
- # Close temp file for chapter; copy it to final version, inserting
- # TOC at top; delete temp file.
- function FinishCurrentChapter( nameLength)
- {
- PrintToOutFile("<P>");
- DoChapterTOC();
- EndHTML();
- close(outFile);
- oldOutFile = outFile;
- nameLength = length(outFile);
- outFile = substr(outFile, 1, nameLength - 1);
- WriteFinalChapter();
- close(outFile);
- close(oldOutFile);
- remove(oldOutFile);
- # temporarily, we have no outFile to write to
- outFile = "";
- }
-
- function StartHTML(chapterName)
- {
- PrintToOutFile("<HTML>");
- PrintToOutFile("");
- PrintToOutFile("<HEAD>");
- PrintToOutFile("<TITLE>" chapterName "</TITLE>");
- PrintToOutFile("</HEAD>");
- PrintToOutFile("");
- PrintToOutFile("<BODY>");
- PrintToOutFile("<H1>" chapterName "</H1>");
- PrintToOutFile("<HR>");
- PrintToOutFile("");
- PrintToOutFile(contentsMarker); # table of contents goes here on 2nd pass
- }
-
- function EndHTML()
- {
- PrintToOutFile("");
- PrintToOutFile("</BODY>");
- PrintToOutFile("");
- PrintToOutFile("</HTML>");
- PrintToOutFile("");
- PrintToOutFile("");
- }
-
- # Print a heading and named anchor. "level" should be "H2", "H3" etc.
- # Having trouble with "?" in name, so leave it out of anchor name.
- function PrintHeading(name, level)
- {
- PrintToOutFile("<" level "><A NAME = \"" NoQuestionVersionOf(name) "\" >" name "</A></" level ">");
- }
-
- function NoQuestionVersionOf(name)
- {
- gsub(/\?/, "", name);
- return name;
- }
-
- # All pictures are in ":Graphics:" beside ":Text:", and so "theLocation"
- # says go one level up and then down into Graphics.
- function PrintGIFTag(theLocation, theGIFName, copyOfName)
- {
- PrintToOutFile("<P>");
- # One little wrinkle, turn the "#" in name into "%23"
- copyOfName = theGIFName;
- sub(/#/, "%23", copyOfName);
- PrintToOutFile("<IMG SRC=\"" theLocation copyOfName "\" ALIGN = \"top\">");
- PrintToOutFile("<P>");
- }
-
- # The main table of contents.
- # We just print two levels. Additional levels would probably need additional
- # counters h3Counter, h4Counter etc and additional arrays.
- function DoTOC( i, j, numSubHeads, contents, showSubs)
- {
- PrintToOutFile("<H2><A NAME = \"Table of Contents\">" " Table of Contents " "</A></H2>");
- PrintToOutFile("<UL>");
- for (i = 1; i <= h2Counter; ++i)
- {
- numSubHeads = split(TOC[i], contents, SUBSEP);
- if (contents[numSubHeads] == "")
- --numSubHeads;
-
- # Print the main heading, href is file corresponding to chapter
- PrintToOutFile("<LI> <A HREF = \"Text/" FileNameForChapter(contents[1]) "\"> " contents[1] " </A>");
-
- # then print the subheadings
- if (numSubHeads > 1)
- {
- # Check there are some headings to show - don't show if name starts with "!"
- showSubs = 0;
- for (j = 2; j <= numSubHeads; ++j)
- {
- if (index(contents[j], "!") != 1)
- {
- showSubs = 1;
- break;
- }
- }
- if (showSubs == 1)
- {
- PrintToOutFile("\t<UL>");
- for (j = 2; j <= numSubHeads; ++j)
- {
- # href consists of location, file name (from chapter name), "#", subsection name
- if (index(contents[j], "!") != 1)
- PrintToOutFile("\t<LI> <A HREF = \"Text/" FileNameForChapter(contents[1]) "#" NoQuestionVersionOf(contents[j]) "\"> " contents[j] " </A>");
- }
- PrintToOutFile("\t</UL>");
- }
- }
- }
- PrintToOutFile("</UL>");
- }
-
- function DoChapterTOC( i, j, numSubHeads, contents)
- {
- # Print link to main table of contents
- PrintToOutFile("<A HREF = \"../" contentsFileName "#Table of Contents\">Main Contents</A>");
-
- # Print chapter's table of contents
- i = h2Counter;
- numSubHeads = split(TOC[i], contents, SUBSEP);
- if (contents[numSubHeads] == "")
- --numSubHeads;
- # Print the main heading
- PrintToOutFile("<H2> <A HREF = \"#" contents[1] "\"> " contents[1] " </A></H2>");
- if (numSubHeads > 1)
- {
- PrintToOutFile("\t<UL>");
- for (j = 2; j <= numSubHeads; ++j)
- {
- # Trim any leading "!"
- if (index(contents[j], "!") == 1)
- contents[j] = substr(contents[j], 2);
- PrintToOutFile("\t<LI> <A HREF = \"#" NoQuestionVersionOf(contents[j]) "\"> " contents[j] " </A>");
- }
- PrintToOutFile("\t</UL>");
- }
- }
-
- function WriteFinalChapter( haveSeenContents)
- {
- haveSeenContents = 0; #speed things up with a simple "boolean"
-
- # Get lines from oldOutFile to the variable line.
- while ((getline line < oldOutFile) > 0)
- {
- if (haveSeenContents == 0 && line ~ contentsMarker)
- {
- DoChapterTOC();
- PrintToOutFile("");
- haveSeenContents = 1;
- }
- else
- PrintToOutFile(line);
- }
- }
-
- # Write the main file, table of contents at one level above the chapter documents.
- # Finish any open chapter first.
- function WriteMainContentsFile()
- {
- if (outFile != "")
- FinishCurrentChapter();
-
- outFile = contentsFileLocation contentsFileName;
-
- PrintToOutFile("<HTML>");
- PrintToOutFile("");
- PrintToOutFile("<HEAD>");
- PrintToOutFile("<TITLE>" theMainTitle "</TITLE>");
- PrintToOutFile("</HEAD>");
- PrintToOutFile("");
- PrintToOutFile("<BODY>");
- PrintToOutFile("<H1>" theMainHead "</H1>");
- PrintToOutFile("<HR>");
- PrintToOutFile("");
-
- DoTOC();
-
- PrintToOutFile("");
- PrintToOutFile("</BODY>");
- PrintToOutFile("");
- PrintToOutFile("</HTML>");
- PrintToOutFile("");
- PrintToOutFile("");
-
- close(outFile);
- }
-
- # Working within the current bounds of hAWK, we can (just barely) persuade
- # a folder to come into existence by using "copy", which creates folders
- # along the specified path if possible. So we make a file, copy it to the
- # folder we want to exist, and then remove both versions of the file. Ugh.
- function MakeFolder(folderPathName, xFile, xFileSource, xFileDest)
- {
- xFile = "Temp1342134HIKE!";
- xFileSource = STDPATH xFile;
- xFileDest = folderPathName xFile;
- print "Hello" > xFileSource;
- close(xFileSource);
- copy(xFileSource, xFileDest);
- remove(xFileSource);
- remove(xFileDest);
- }
-